16 research outputs found

    Tidying up international nucleotide sequence databases

    Get PDF
    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi

    PlutoF—a Web Based Workbench for Ecological and Taxonomic Research, with an Online Implementation for Fungal ITS Sequences

    Get PDF
    DNA sequences accumulating in the International Nucleotide Sequence Databases (INSD) form a rich source of information for taxonomic and ecological meta-analyses. However, these databases include many erroneous entries, and the data itself is poorly annotated with metadata, making it difficult to target and extract entries of interest with any degree of precision. Here we describe the web-based workbench PlutoF, which is designed to bridge the gap between the needs of contemporary research in biology and the existing software resources and databases. Built on a relational database, PlutoF allows remote-access rapid submission, retrieval, and analysis of study, specimen, and sequence data in INSD as well as for private datasets though web-based thin clients. In contrast to INSD, PlutoF supports internationally standardized terminology to allow very specific annotation and linking of interacting specimens and species. The sequence analysis module is optimized for identification and analysis of environmental ITS sequences of fungi, but it can be modified to operate on any genetic marker and group of organisms. The workbench is available at http://plutof.ut.ee

    Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

    Get PDF
    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi

    An open source software package for automated extraction of ITS1 and ITS2 from fungal ITS sequences for use in high-throughput community assays and molecular ecology

    No full text
    We introduce an open source software utility to extract the highly variable ITS1 and ITS2 subregions from fungal nuclear ITS sequences, the region of choice for environmental sampling and molecular identification of fungi. Inclusion of parts of the neighbouring, very conserved, ribosomal genes in the sequence identification process regularly leads to distorted results. The utility is available for UNIX-type operating systems, including MacOS X, and processes about 1000 sequences per minute

    An open source chimera checker for the fungal ITS region

    No full text
    The internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit holds a central position in the pursuit of the taxonomic affiliation of fungi recovered through environmental sampling. Newly generated fungal ITS sequences are typically compared against the International Nucleotide Sequence Databases for a species or genus name using the sequence similarity software suite blast. Such searches are not without complications however, and one of them is the presence of chimeric entries among the query or reference sequences. Chimeras are artificial sequences, generated unintentionally during the polymerase chain reaction step, that feature sequence data from two (or possibly more) distinct species. Available software solutions for chimera control do not readily target the fungal ITS region, but the present study introduces a blast-based open source software package (available at http://www.emerencia.org/chimerachecker.html) to examine newly generated fungal ITS sequences for the presence of potentially chimeric elements in batch mode. We used the software package on a random set of 12 300 environmental fungal ITS sequences in the public sequence databases and found 1.5% of the entries to be chimeric at the ordinal level after manual verification of the results. The proportion of chimeras in the sequence databases can be hypothesized to increase as emerging sequencing technologies drawing from pooled DNA samples are becoming important tools in molecular ecology research

    An open source chimera checker for the fungal ITS region

    No full text
    The internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit holds a central position in the pursuit of the taxonomic affiliation of fungi recovered through environmental sampling. Newly generated fungal ITS sequences are typically compared against the International Nucleotide Sequence Databases for a species or genus name using the sequence similarity software suite blast. Such searches are not without complications however, and one of them is the presence of chimeric entries among the query or reference sequences. Chimeras are artificial sequences, generated unintentionally during the polymerase chain reaction step, that feature sequence data from two (or possibly more) distinct species. Available software solutions for chimera control do not readily target the fungal ITS region, but the present study introduces a blast-based open source software package (available at http://www.emerencia.org/chimerachecker.html) to examine newly generated fungal ITS sequences for the presence of potentially chimeric elements in batch mode. We used the software package on a random set of 12 300 environmental fungal ITS sequences in the public sequence databases and found 1.5% of the entries to be chimeric at the ordinal level after manual verification of the results. The proportion of chimeras in the sequence databases can be hypothesized to increase as emerging sequencing technologies drawing from pooled DNA samples are becoming important tools in molecular ecology research

    A note on the incidence of reverse complementary fungal ITS sequences in the public sequence databases and a software tool for their detection and reorientation

    No full text
    Reverse complementary DNA sequences––sequences that are inadvertently cast backward and in which all purines and pyrimidines are transposed––are not uncommon in sequence databases, where they may introduce noise into sequence-based research. We show that about 1% of the public fungal ITS sequences, the most commonly sequenced genetic marker in mycology, are reverse complementary, and we introduce an open source software solution to automate their detection and reorientation. The MacOSX/Linux/UNIX software operates on public or private datasets of any size, although some 50 base pairs of the 5.8S gene of the ITS region are needed for the analysis

    V-REVCOMP:Automated high-throughput detection of reverse complementary 16S rRNA gene sequences in large environmental and taxonomic datasets

    No full text
    Reverse complementary DNA sequences - sequences that are inadvertently given backwards with all purines and pyrimidines transposed - can affect sequence analysis detrimentally unless taken into account. We present an open-source, high-throughput software tool -v-revcomp - to detect and reorient reverse complementary entries of the small-subunit rRNA (16S) gene from sequencing datasets, particularly from environmental sources. The software supports sequence lengths ranging from full length down to the short reads that are characteristic of next-generation sequencing technologies. We evaluated the reliability of v-revcomp by screening all 406781 16S sequences deposited in release 102 of the curated SILVA database and demonstrated that the tool has a detection accuracy of virtually 100%. We subsequently used v-revcomp to analyse 1171646 16S sequences deposited in the International Nucleotide Sequence Databases and found that about 1% of these user-submitted sequences were reverse complementary. In addition, a nontrivial proportion of the entries were otherwise anomalous, including reverse complementary chimeras, sequences associated with wrong taxa, nonribosomal genes, sequences of poor quality or otherwise erroneous sequences without a reasonable match to any other entry in the database. Thus, v-revcomp is highly efficient in detecting and reorienting reverse complementary 16S sequences of almost any length and can be used to detect various sequence anomalies

    Quality and metadata annotations of fungal ITS sequences by mycorrhiza types.

    No full text
    1<p>all sequences belonging to EcM lineages regardless of isolation source;</p>2<p>all sequences belonging to Glomeromycota, except <i>Geosiphon;</i></p>3<p>all sequences derived from roots (with or without a culturing step) of the respective host plants;</p>4<p>Chimeric sequences consist of two or more fragments of fungal sequences and are therefore not assigned below the kingdom level;</p>5<p>nd, not determined;</p>6<p>Annotated–sum of original annotations and metadata provided in the course of this study;</p>7<p>formation of coils in ErM, formation of pelotons, stimulation of germination or development in OM.</p
    corecore